Skip to content

feat(build): per-app python_version with cross-resource validation#322

Open
deanq wants to merge 3 commits intomainfrom
deanq/ae-2827-multi-python-versions
Open

feat(build): per-app python_version with cross-resource validation#322
deanq wants to merge 3 commits intomainfrom
deanq/ae-2827-multi-python-versions

Conversation

@deanq
Copy link
Copy Markdown
Member

@deanq deanq commented Apr 21, 2026

Summary

Expand GPU and CPU worker images to support Python 3.10, 3.11, and 3.12. Flash apps ship as one tarball so every resource in an app must share a Python version; the build step now reconciles per-resource python_version declarations into a single app-level value, or accepts an explicit --python-version override on flash build / flash deploy.

This is the SDK half of AE-2827. The flash-worker half (GPU Dockerfile parameterization for side-by-side torch install + CI matrix expansion + worker startup assertion) ships as a sibling PR once image builds can be validated end-to-end.

Changes

  • constants.pyGPU_PYTHON_VERSIONS / CPU_PYTHON_VERSIONS expanded from ("3.12",) to ("3.10", "3.11", "3.12"). DEFAULT_PYTHON_VERSION stays 3.12.
  • build_utils/manifest.py — new _reconcile_python_version(). Stamps the reconciled version onto every resource's target_python_version; removes the now-redundant GPU/CPU special-case that hardcoded 3.12.
  • build.py / deploy.py--python-version CLI flag threaded through run_build into ManifestBuilder. Existing pip-wheel threading via _resolve_pip_python_version keeps working.
  • docs/Flash_Deploy_Guide.md — new "Python version selection" section documenting per-resource declarations, the app-level override, the ~7 GB GPU cold-start tax for non-3.12, and the 3.10 EOL (2026-10-31) warning.
  • tests/unit/test_dotenv_loading.py — add preserve_runpod_flash_modules fixture. Two existing tests force-reload runpod_flash via del sys.modules[...] without restoring, leaking stale module references into sibling test files and causing flakes in TestRemoteClassWrapperPickle. The fixture snapshots and restores. Pre-existing main had 22 order-dependent failures in a -p no:randomly run; this branch now has 0 in the pickle cluster.

Reconciliation rules

Resolution order when building the manifest:

  1. Explicit --python-version override (validated against SUPPORTED_PYTHON_VERSIONS)
  2. Exactly one distinct python_version declared across resource configs
  3. DEFAULT_PYTHON_VERSION when no resource declares one

Raises ValueError when resources declare conflicting versions, or when the override conflicts with a resource's explicit declaration.

Test plan

  • make quality-check passes (85.7% coverage)
  • 9 new TestReconcilePythonVersion cases cover override/conflict/default paths
  • Parametrized 3.10 / 3.11 / 3.12 across test_constants.py + test_live_serverless.py
  • Test isolation fix verified: pytest -p no:randomly tests/unit/ pickle-cluster failures drop from 6 → 0
  • Follow-up PR: flash-worker GPU Dockerfile side-by-side torch install + CI matrix + worker startup assertion
  • Integration: end-to-end deploy of a 3.11-targeted FlashApp once the flash-worker PR lands

@promptless
Copy link
Copy Markdown

promptless Bot commented Apr 21, 2026

Promptless prepared a documentation update related to this change.

Triggered by runpod/flash#322

This PR introduces per-app Python version selection for Flash, allowing users to target Python 3.10, 3.11, or 3.12. The docs update covers the new --python-version CLI flag, per-resource python_version configuration, reconciliation rules, and cold-start performance implications for GPU workers.

Review: Document per-app Python version selection for Flash

@deanq deanq changed the title feat(build): per-app python_version with cross-resource validation (AE-2827) feat(build): per-app python_version with cross-resource validation Apr 21, 2026
@deanq deanq force-pushed the deanq/ae-2827-multi-python-versions branch from 2548f64 to 6138ae2 Compare April 22, 2026 04:32
Copy link
Copy Markdown
Contributor

@runpod-Henrik runpod-Henrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QA Review — PR 322

Tested against v1.14.0 baseline. Four findings.


Finding 1: SDK ships before worker images exist

GPU_PYTHON_VERSIONS and CPU_PYTHON_VERSIONS now include "3.10" and "3.11", but these image tags don't exist until a separate flash-worker PR merges. A user who deploys with --python-version 3.10 or --python-version 3.11 today will silently receive the 3.12 image (or the deploy will fail with an opaque image-pull error), depending on how RunPod handles missing tags. The CLI accepts the flag and emits no warning.

Question: Is there a plan to block --python-version 3.10 and --python-version 3.11 at the CLI until the worker images exist? Or will this ship as-is?


Finding 2: First deploy after SDK upgrade may trigger a spurious rolling release

python_version is now extracted from resource configs and stamped into the manifest (via target_python_version). For any project that had python_version set on a resource before this PR, the manifest fingerprint will differ from the deployed fingerprint even though the user changed nothing. This will trigger the rolling release recycle on first deploy after upgrade.

Whether this is acceptable depends on whether the behaviour is documented. Nothing in the changelog or the _reconcile_python_version docstring mentions it.

Question: Is this a known and acceptable tradeoff? If so, it should be called out in the release notes.


Finding 3: self.python_version is set twice in ManifestBuilder

In manifest.py, self.python_version is assigned in __init__ and then overwritten inside build(). Any code that reads self.python_version between construction and build() would see the wrong value. Currently nothing does, but the pattern is fragile. The __init__ assignment is dead assignment — if build() always overwrites it, the __init__ line should be removed.


Finding 4: Isolation fix is scoped to two tests only

preserve_runpod_flash_modules is added to two tests in test_dotenv_loading.py. If the same dotenv re-import issue surfaces in other test files (the underlying cause is module-level side effects in runpod_flash/__init__.py), those tests will still be fragile. This fix is targeted, not structural — that's fine for now but worth noting if isolation artifacts appear elsewhere.


No blockers on the core logic. _reconcile_python_version() reads cleanly, the validation cases are thorough (9 new tests cover the main paths), and the CLI flag wiring is straightforward.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends Flash’s worker-image Python version support to 3.10/3.11/3.12 and adds app-level reconciliation/validation so a single deploy artifact (one tarball) can’t mix Python ABIs across resources.

Changes:

  • Expand supported GPU/CPU Python versions to 3.10, 3.11, and 3.12, keeping 3.12 as the default.
  • Add manifest reconciliation to enforce a single app-level Python version (with --python-version override support in flash build / flash deploy) and stamp a consistent target_python_version per resource.
  • Update unit tests and docs to reflect the new version-selection behavior and improve test isolation around module reloading.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/runpod_flash/core/resources/constants.py Expands supported Python versions and updates related documentation/comments.
src/runpod_flash/cli/commands/build_utils/manifest.py Adds cross-resource python_version reconciliation and stamps target_python_version consistently.
src/runpod_flash/cli/commands/build.py Threads --python-version through build to the manifest builder and uses manifest for wheel ABI selection.
src/runpod_flash/cli/commands/deploy.py Adds --python-version flag and passes it through to build.
src/runpod_flash/cli/docs/flash-build.md Documents --python-version (but currently also documents flags not implemented by the CLI).
src/runpod_flash/cli/docs/flash-deploy.md Documents --python-version for deploy.
docs/Flash_Deploy_Guide.md Adds end-user guidance on Python version selection and tradeoffs.
docs/Deployment_Architecture.md Updates architecture notes to reflect targeting the app’s python_version for wheel selection.
tests/unit/cli/commands/build_utils/test_manifest.py Adds unit tests covering reconciliation rules and error cases.
tests/unit/core/resources/test_constants.py Updates assertions for the expanded version sets.
tests/unit/resources/test_live_serverless.py Updates live resource tests to accept the expanded supported versions.
tests/unit/test_dotenv_loading.py Adds a fixture to prevent sys.modules cleanup from leaking across tests (flake reduction).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +29 to +33
@@ -30,6 +30,7 @@ flash build [OPTIONS]
- `--output, -o`: Custom archive name (default: artifact.tar.gz)
- `--exclude`: Comma-separated packages to exclude (e.g., 'torch,torchvision')
- `--preview`: Launch local test environment after successful build (auto-enables `--keep-build`)
- `--python-version`: Target Python version for worker images (`3.10`, `3.11`, or `3.12`). Overrides per-resource `python_version`. Default: value declared on resource configs, or 3.12 if none set.
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flash build docs list --keep-build and --preview, but build_command in cli/commands/build.py does not expose either option (and the build directory is currently always kept on success). Please either implement these flags or update the docs to match the actual CLI behavior (and how --preview is supported via flash deploy).

Copilot uses AI. Check for mistakes.
deanq added 2 commits April 24, 2026 02:44
…E-2827)

Expand GPU and CPU worker images to support Python 3.10, 3.11, and 3.12.
Flash apps ship as one tarball so every resource must share a Python
version; the build step now reconciles per-resource python_version
declarations into a single app-level value or accepts an explicit
--python-version override.

- constants.py: expand GPU_PYTHON_VERSIONS / CPU_PYTHON_VERSIONS tuples
- manifest.py: add _reconcile_python_version; stamp target_python_version
  on every resource; raise on conflicting declarations
- build.py / deploy.py: add --python-version CLI flag, thread through
  run_build and ManifestBuilder
- docs: document per-app python_version, cold-start tradeoff for non-3.12
  GPU images, and the 3.10 EOL window
- tests/unit/test_dotenv_loading.py: add preserve_runpod_flash_modules
  fixture so module-deletion tests don't leak stale module references
  into sibling test files (unblocks deterministic test ordering)
The effective app-level Python version is assigned by build() via
_reconcile_python_version(). Pre-build access always saw a value that
build() would overwrite — misleading and never read. Initialize to None
so readers fail fast if they touch the attribute before build().
@deanq deanq force-pushed the deanq/ae-2827-multi-python-versions branch from 4b93537 to ff6aed9 Compare April 24, 2026 09:46
Neither flag is exposed by flash/cli/commands/build.py::build_command;
--preview now lives on flash deploy. Aligns the docs with actual CLI
behavior — build directory is always retained on success, so the
--keep-build toggle is gone.

Caught by Copilot review on PR #322.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants